Experiment with asynchrony in multimodal speech communication
Abstract
The purpose of this study was to examine delay effects in audiovisual speech perception for natural and synthetic faces. The main focus was the SYNFACE project, the development of a telephone communication aid for hearing-impaired persons. The experiments investigated the consequences of temporally displacing the audio channel relative to the visual channel. The audio channel was natural speech with a vocoder-like distortion to simulate hearing loss. Twelve experimental conditions were presented to the subjects in two separate sessions. The natural face was tested with audio-leading (negative) as well as audio-lagging (positive) stimuli, whereas the synthetic face was tested only with audio-leading stimuli. The asynchronies examined were 50, 175, and 300 ms; in addition, two reference conditions were examined: synchrony and audio-only. ANOVA tests including both faces revealed that neither the -300 ms nor the -175 ms condition was significantly better than the audio-only condition, which implies that the final SYNFACE product would not be beneficial at delays of this magnitude. The -50 ms condition, however, did not show significantly lower intelligibility scores than the synchronous condition. Unfortunately, the delay measured in the present SYNFACE prototype is greater than this. It would therefore be interesting to investigate asynchronies between -175 ms and -50 ms to determine exactly where intelligibility drops. ANOVA further showed that the effect of face type was non-significant, indicating that the quality of the synthetic face is close to that of a natural face.
The tolerance for audio-lagging delays is larger than for audio-leading delays, as verified by a significant decrease in performance appearing only at +300 ms (the corresponding audio-leading delay is -175 ms). A gain in intelligibility was even found for the +50 ms condition compared to synchrony. This gain is not significant, however, and statistical analysis showed that delays within the interval [-50, +175] ms have only a small effect on the spoken message for the natural face.
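To make the temporal manipulation concrete, the sketch below shifts an audio track against a fixed visual track using the abstract's sign convention (negative = audio leads, positive = audio lags). It is a minimal sketch under stated assumptions: the sample rate, the function name apply_asynchrony, and the zero-padding strategy are illustrative, not the actual SYNFACE pipeline, and the condition list is one plausible reading of the twelve conditions reported above.

```python
# Hedged sketch of preparing audio-visual asynchrony conditions.
import numpy as np

SAMPLE_RATE = 16_000  # Hz, assumed audio sampling rate


def apply_asynchrony(audio: np.ndarray, offset_ms: int) -> np.ndarray:
    """Shift the audio relative to the (fixed) visual track.

    Negative offsets make the audio lead the face; positive offsets
    make it lag, matching the sign convention in the abstract.
    """
    shift = int(round(offset_ms * SAMPLE_RATE / 1000))
    if shift < 0:
        # Audio leads: drop samples from the start so events play earlier,
        # zero-padding the end to preserve the overall duration.
        return np.concatenate([audio[-shift:], np.zeros(-shift, audio.dtype)])
    if shift > 0:
        # Audio lags: zero-pad the start so events play later.
        return np.concatenate([np.zeros(shift, audio.dtype), audio[:-shift]])
    return audio


# One reading consistent with the twelve conditions in the abstract:
# natural face at all offsets plus synchrony (7), synthetic face at
# audio-leading offsets plus synchrony (4), and audio-only (1).
CONDITIONS = (
    [("natural", d) for d in (-300, -175, -50, 0, 50, 175, 300)]
    + [("synthetic", d) for d in (-300, -175, -50, 0)]
    + [("audio_only", None)]
)
assert len(CONDITIONS) == 12
```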
Similar resources
Viseme-dependent weight optimization for CHMM-based audio-visual speech recognition
The aim of the present study is to investigate some key challenges of audio-visual speech recognition technology, such as asynchrony modeling of multimodal speech, estimation of auditory and visual speech significance, and stream weight optimization. Our research shows that the use of viseme-dependent significance weights improves the performance of state-asynchronous CHMM-based spee...
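The stream-weight idea mentioned in this entry can be illustrated briefly. The following is a minimal sketch, not the cited paper's actual model: the weight table, viseme labels, and function name are assumptions; only the standard multistream log-likelihood combination is taken as given.

```python
# Hedged sketch of viseme-dependent stream weighting for audio-visual ASR.

# Assumed per-viseme audio weights w in [0, 1]; the video stream gets
# (1 - w). The labels and values here are made up for illustration.
VISEME_AUDIO_WEIGHT = {"bilabial": 0.4, "rounded": 0.5, "neutral": 0.8}


def combined_log_likelihood(log_p_audio: float, log_p_video: float,
                            viseme: str) -> float:
    """Standard multistream combination:
    log p = w * log p_audio + (1 - w) * log p_video,
    with w chosen per viseme class."""
    w = VISEME_AUDIO_WEIGHT.get(viseme, 0.5)
    return w * log_p_audio + (1.0 - w) * log_p_video
```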
Myoelectric signals for multimodal speech recognition
A Coupled Hidden Markov Model (CHMM) is proposed in this paper to perform multimodal speech recognition using myoelectric signals (MES) from the muscles of vocal articulation. MES signals are immune to acoustic noise, and words that are acoustically similar manifest distinctly in MES. Hence, they would effectively complement the acoustic data in a multimodal speech recognition system. Research in Audio-V...
Multimodal Sentence Intelligibility and the Detection of Auditory-Visual Asynchrony in Speech and Nonspeech Signals: A First Report
The ability to perceive and understand visual-only speech, and the benefit experienced from having both auditory and visual signals available during speech perception tasks, vary widely in the normal-hearing population. At present, little is known about the underlying neural mechanisms responsible for this variability or the possible relationships between multisensory speech perception...
Achieving Multimodal Cohesion during Intercultural Conversations
How do speakers of English as a lingua franca (ELF) achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere during intercultural conversations. The data include approximately 160 minutes of transcribed video recordings of ELF interactions ...
Modelling asynchrony in speech using elementary single-signal decomposition
Although the possibility of asynchrony between different components of the speech spectrum has been acknowledged, its potential effect on automatic speech recogniser performance has only recently been studied. This paper presents the results of continuous speech recognition experiments in which such asynchrony is accommodated using a variant of HMM decomposition. The paper begins with an invest...